Twitter Network Analysis of the California Camp Fire

Alex John Quijano$^{-}$, Maia Powell$^{-}$, Matthew Mondares$^{+}$

$^{-}$University of California Merced, Applied Mathematics

$^{+}$University of California Merced, Management and Complex Systems

This is the project documentation for the Global Good Studio (COGS-269), University of California, Merced.

Links to the interactive networks:

  1. 2018 User Network
  2. 2019 User Network
  3. 2018 Hashtag Network
  4. 2019 Hashtag Network
In [16]:
# import required modules
import os
import numpy as np
import pandas as pd
import networkx as nx
import matplotlib
font = {'size': 20}
matplotlib.rc('font',**font)
import matplotlib.pyplot as plt
from matplotlib import cm
import myUtilities as mu
try:
    os.mkdir('figures')
except FileExistsError:
    pass
import plotly
import plotly.graph_objects as go
import math

1. Dataset.

The data was scraped using a scraper provided by the GitHub user jonbakerfish and processed by the GitHub user stressosaurus. The scraper collected tweets from November 2018 and November 2019 using the set of general keywords listed below. To retrieve additional tweets related to those collected with these keywords, a second scraping pass was performed using the Twitter API. This pass retrieves contextual information such as the parent tweet when a tweet is a reply, and a depth-first search is applied so that entire reply chains are recovered.

General Keywords - Fire related

bushfire
bushfires
conflagration
conflagrations
arson
arsons
smolder
smolders
smoldered
firebreak
firebreaks
blaze
blazed
burn
burns
burned
firestorm
firestorms
campfire
campfires
flame
flames
flamed
bonfire
bonfires
heat
heats
heated
flare
flares
flared

To further narrow the data to tweets related to the California Camp Fire, the dataset is subset using the following keywords. All related tweets are also included in the subset: tweets with replies, the replies themselves, and tweets containing any cooccurring hashtags.

#campfire
#campfires
#buttecounty
#chico
#campfirepets
#paradise
#bushfires
#magalia
#campfireparadise
#buttestrong
#climatechange
#woolseyfire
#paradisestrong
#campfirejameswoods
#oroville
#paradiseca
#concow
#californiafires
#buttecountyfires
#cafires
#paradisefires
#cawx
#californiastrong
#californiawildfires
#buttecountystrong
#wildfire
#wildfires
#hillfire
#hillfires
#disasterassistteam
#bushfire
#bushfires
bushfire
bushfires
wildfire
wildfires
campfire
campfires

In the next subsection, we describe the data structures of the scraped tweets.

1.1. Load Datasets.

The two code blocks below open two datasets: the tweet data, stored in the variable "T", and the user data, stored in the variable "U".

In [17]:
# twitter information
T = pd.DataFrame(np.load('data-subset/CAF-words-fire-related-words-tweets.npy',
                         allow_pickle=True).item())
# examples
print('This shows two examples from the data.')
print()
T.head(2)
This shows two examples from the data.

Out[17]:
day favorite_count hashtags hour language minute month parent_tweet_id quoted_tweet_id retweet_count second text time_zone urls user_id user_screen_name usermentions with_image year
1067752277266464770 28 15 #LNG,#fires,#flaring,#Gas,#qldpol 12 en 9 11 * * 24 15 Are #LNG plants safe from the regions #fires? ... +0000 * 4031994734 [W-USN6880] [W-USN133],[W-USN6881],[W-USN5432] True 2018
1068026533883740160 29 3 * 6 en 19 11 1067942509639163904 1068025742942863360 0 3 Large parts of Queensland remain under siege t... +0000 https://twitter.com/7NewsBrisbane/status/10680... 74382140 [W-USN4724] * False 2018
In [18]:
# user information
# user Twitter handles are patched; this file maps patched keys back to actual handles
U = np.load('data-subset/CAF-words-fire-related-words-users.npy',
            allow_pickle=True).item()

# examples
print('This shows an example of a user information and the associated key in the data.')
print()
print(U['key']['Alyssa_Milano'])
print(U['key']['[W-USN1783]'])
print(U['information']['Alyssa_Milano'])
This shows an example of a user information and the associated key in the data.

[W-USN1783]
Alyssa_Milano
{'user_id': 26642006, 'year': 2009, 'month': 3, 'day': 26, 'hour': 0, 'minute': 34, 'second': 20, 'user_number_of_followers': 3686733, 'user_location': 'Los Angeles', 'patch': '[W-USN1783]'}

1.2. Load Networks.

1.2.1. Defining the User Cooccurrence (or User-User) Network.

The user-user network is an undirected network, showing interactions between users via @ mentions and replies. Consequently, each node represents a single Twitter user and each edge is an interaction.

1.2.2. Defining the Hashtag Cooccurrence Network.

The hashtag cooccurrence network is an undirected network in which nodes represent individual hashtags and edges represent their cooccurrence: if a tweet contains two hashtags $a$ and $b$, then $a$ and $b$ cooccur.
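A minimal sketch of how such a cooccurrence network can be built with networkx, using toy hashtag lists rather than the project's data files:

```python
# Build a hashtag cooccurrence network: every pair of hashtags appearing
# in the same tweet gets an edge, weighted by cooccurrence frequency.
import itertools
import networkx as nx

tweets_hashtags = [
    ['#campfire', '#paradise'],
    ['#campfire', '#paradise', '#buttecounty'],
    ['#wildfire'],
]

G = nx.Graph()
for tags in tweets_hashtags:
    for a, b in itertools.combinations(sorted(set(tags)), 2):
        if G.has_edge(a, b):
            G[a][b]['frequency'] += 1
        else:
            G.add_edge(a, b, frequency=1)

print(G['#campfire']['#paradise']['frequency'])  # → 2 (cooccur in two tweets)
```

The `frequency` edge attribute here mirrors the attribute read off the project's network files in later cells; the user-user network is built the same way from mentions and replies instead of hashtags.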

The following code block opens the user cooccurrence and hashtag cooccurrence networks for November 2018 and November 2019. The resulting networks have multiple components, or subnetworks: each component is a connected subnetwork within the overall network, and distinct components are disconnected from one another.
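A sketch of how such per-component files can be produced (the project's actual preprocessing may differ): networkx splits an undirected graph into its connected components, each of which is itself a graph.

```python
# Split an undirected graph into connected-component subgraphs,
# largest component first.
import networkx as nx

G = nx.Graph([(1, 2), (2, 3), (10, 11)])  # two disconnected pieces
components = [G.subgraph(c).copy() for c in
              sorted(nx.connected_components(G), key=len, reverse=True)]

print(len(components))                  # → 2
print(components[0].number_of_nodes())  # → 3 (the largest component)
```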

In [19]:
# full networks - networkx data structure
USN_G_112018 = nx.read_gpickle('data-networks/USN-nx-112018.gpickle')
HTGS_G_112018 = nx.read_gpickle('data-networks/HTGS-nx-112018.gpickle')
USN_G_112019 = nx.read_gpickle('data-networks/USN-nx-112019.gpickle')
HTGS_G_112019 = nx.read_gpickle('data-networks/HTGS-nx-112019.gpickle')

# full network in components - networkx data structure
USN_G_112018_C = nx.read_gpickle('data-networks/USN-nx-112018-comps.gpickle')
HTGS_G_112018_C = nx.read_gpickle('data-networks/HTGS-nx-112018-comps.gpickle')
USN_G_112019_C = nx.read_gpickle('data-networks/USN-nx-112019-comps.gpickle')
HTGS_G_112019_C = nx.read_gpickle('data-networks/HTGS-nx-112019-comps.gpickle')

2. User Bot Values.

The users are classified using the Botometer model by OSoMe (GitHub repository botometer). Botometer uses the Twitter API to compute scores indicating whether a user is a bot. In this project we use two of them: the display score, and the complete automation probability, which is the probability that a user's tweets are automated. Display scores range from $0$ to $5$, where $5$ means a user is more likely a bot and $0$ means a user is less likely a bot. To assign a single value to each user, we compute the bot score as the linear combination of the two scores given by

$$b = \frac{1}{2} \left( 5 \times \text{complete automation probability} \right) + \frac{1}{2} \left( \text{display score} \right)$$

where the resulting bot score is a continuous number from $0$ to $5$. To separate the bot scores into $5$ discrete categories, we apply the function below, whose output we call the bot value.

$$ f(b)= \begin{cases} 1 & \text{if } 0 \le b < 1 \text{ (user is not a bot)} \\ 2 & \text{if } 1 \le b < 2 \\ 3 & \text{if } 2 \le b < 3 \\ 4 & \text{if } 3 \le b < 4 \\ 5 & \text{if } 4 \le b \le 5 \text{ (user is a bot)} \end{cases} $$
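The two definitions above can be sketched directly: the complete automation probability (in $[0,1]$) is rescaled to $[0,5]$ and averaged with the display score, then the continuous score is floored into the five categories. The function names are ours, not Botometer's.

```python
# Sketch of the bot score (continuous, 0 to 5) and bot value (discrete, 1 to 5).
import math

def bot_score(cap, display_score):
    # average of the rescaled complete automation probability and display score
    return 0.5 * (5 * cap) + 0.5 * display_score

def bot_value(b):
    # floor into categories 1..5; b = 5 exactly still maps to category 5
    return min(math.floor(b) + 1, 5)

print(bot_value(bot_score(0.1, 0.4)))  # → 1 (likely human)
print(bot_value(bot_score(0.9, 4.6)))  # → 5 (likely bot)
```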

3. Centrality Measures.

3.1. The Eigenvector Centrality.

Eigenvector centrality provides a metric for influence. A node is important, or more influential, if it is connected to other important nodes.

Definition. Let $A = (a_{i,j})$ be the adjacency matrix of a graph, where $a_{i,j} = 1$ if nodes $i$ and $j$ are connected and $a_{i,j} = 0$ otherwise. We compute the eigenvalues $\lambda_1, \lambda_2, \ldots, \lambda_n$ of $A$, select $\lambda_{max} = \max_{i} |\lambda_i|$, and find its corresponding eigenvector $\vec{x}_{\lambda_{max}}$. The eigenvector centrality of node $i$ is then the $i^{th}$ component of $\vec{x}_{\lambda_{max}}$.
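The definition can be checked on a tiny example: for a 3-node path graph, the dominant eigenvector of the adjacency matrix assigns the middle node the highest centrality.

```python
# Eigenvector centrality from the definition: the eigenvector of the
# adjacency matrix's largest eigenvalue (A is symmetric, so eigh applies).
import numpy as np
import networkx as nx

G = nx.path_graph(3)              # 0 - 1 - 2
A = nx.to_numpy_array(G)
vals, vecs = np.linalg.eigh(A)    # eigenvalues in ascending order
x = np.abs(vecs[:, -1])           # unit eigenvector of the largest eigenvalue

print(np.round(x, 4))             # middle node is most central
```

For this graph the exact values are $(1/2, \sqrt{2}/2, 1/2)$; the `centrality` node attribute loaded later in this notebook was computed the same way on the full networks.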

3.2. The Betweenness Centrality.

Betweenness centrality measures the extent to which a node lies on paths between other nodes. High betweenness of a node thus implies that it has influence over other nodes through its control over the transmission of information throughout the network.

Definition. The betweenness centrality $\beta$ of a node $a$ is $$\beta(a) = \sum_{a \neq b \neq c} \frac{\sigma_{bc}(a)}{\sigma_{bc}}$$ where $\sigma_{bc}$ denotes the total number of geodesic (shortest) paths between nodes $b$ and $c$, and $\sigma_{bc}(a)$ denotes the number of those paths that pass through $a$.
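On a 3-node path graph the definition is easy to verify by hand: the middle node lies on the single geodesic between the two endpoints, so its (unnormalized) betweenness is $1$, while the endpoints have betweenness $0$.

```python
# Betweenness centrality on a small example, matching the definition above.
import networkx as nx

G = nx.path_graph(3)   # 0 - 1 - 2
bet = nx.betweenness_centrality(G, normalized=False)
print(bet)             # → {0: 0.0, 1: 1.0, 2: 0.0}
```

The `betweenness` node attribute used later in this notebook is the normalized variant, which divides by the number of node pairs.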

4. Visualizing the Networks.

4.1. November 2018 User Co-occurrence (User-User) Network.
In [20]:
# User Bot Distribution
B5CAT_vect = []
colormap_seismic = cm.get_cmap('RdYlBu')
B5CAT_color = list(reversed([colormap_seismic(i) for i in np.linspace(0,1,5)]))
for i in USN_G_112018.nodes():
    B5CAT_vect.append(USN_G_112018.nodes[i]['bot_5cat'])
u, c = np.unique(B5CAT_vect,return_counts=True)
c_results = {'':c}

def survey(results, category_names):
    """
    Parameters
    ----------
    results : dict
        A mapping from question labels to a list of answers per category.
        It is assumed all lists contain the same number of entries and that
        it matches the length of *category_names*.
    category_names : list of str
        The category labels.
    """
    labels = list(results.keys())
    data = np.array(list(results.values()))
    data_cum = data.cumsum(axis=1)
    category_colors = plt.get_cmap('RdYlGn')(
        np.linspace(0.15, 0.85, data.shape[1]))

    fig, ax = plt.subplots(figsize=(14, 3))
    ax.invert_yaxis()
    ax.xaxis.set_visible(False)
    ax.set_xlim(0, np.sum(data, axis=1).max())

    for i, (colname, color) in enumerate(zip(category_names, B5CAT_color)):
        widths = data[:, i]
        starts = data_cum[:, i] - widths
        
        ax.barh(labels, widths, left=starts, height=0.5,
                label=str(colname)+' (users='+str(int(widths[0]))+')', color=color)
        xcenters = starts + widths / 2

        r, g, b, _ = color
        text_color = 'white' if r * g * b < 0.5 else 'darkgrey'
        for y, (x, c) in enumerate(zip(xcenters, widths)):
            ax.text(x, y,'', ha='center', va='center',
                    color=text_color)
    ax.legend(bbox_to_anchor=(0, 1.02,1,.102),
              loc=3, fontsize="small",mode='expand',ncol=5)

    return fig, ax

fig, ax = survey(c_results, u)
ax.set_title('user cooccurrence network \n combined bot value distribution November 2018 \n\n')
plt.tight_layout()
plt.savefig('figures/USN-combinedBotValueDistribution-112018.png')
plt.show()
In [21]:
# users cooccurrence frequency distributions
freq_vect = {'cluster':[],'edge':[],'frequency':[]}
for j, i in enumerate(USN_G_112018_C):
    for k in i.edges():
        freq_vect['cluster'].append(j)
        freq_vect['edge'].append((U['key'][k[0]],U['key'][k[1]]))
        freq_vect['frequency'].append(i.edges[k]['frequency'])
freq_vect = pd.DataFrame(freq_vect).set_index('edge').sort_values(by='frequency',ascending=False)

# list top edge frequency values
print('Top edges in November 2018')
print(freq_vect.head(20))
Top edges in November 2018
                                   cluster  frequency
edge                                                 
(EthonRaptor, GillesnFio)                0         51
(PolAnimalAus, EthonRaptor)              0         48
(EthonRaptor, dvibrationz)               0         48
(PolAnimalAus, dvibrationz)              0         48
(PolAnimalAus, GillesnFio)               0         48
(dvibrationz, GillesnFio)                0         48
(EthonRaptor, swcrisis)                  0         32
(WWF, POTUS)                             0         30
(Tangomitteckel, GillesnFio)             0         30
(latimes, POTUS)                         0         30
(EthonRaptor, Tangomitteckel)            0         30
(EthonRaptor, 3GHtweets)                 0         29
(badmoonrising11, GillesnFio)            0         28
(EthonRaptor, badmoonrising11)           0         28
(Tangomitteckel, badmoonrising11)        0         28
(latimes, WWF)                           0         28
(swcrisis, badmoonrising11)              0         27
(EthonRaptor, KIVUNature)                0         27
(swcrisis, GillesnFio)                   0         27
(Tangomitteckel, swcrisis)               0         27
In [22]:
# user centralities distributions
centrality_vect = {'cluster':[],'user':[],'eig':[],'bet':[],'deg':[]}
for j, i in enumerate(USN_G_112018_C):
    for k in i.nodes():
        centrality_vect['cluster'].append(j)
        centrality_vect['user'].append(U['key'][k])
        centrality_vect['eig'].append(i.nodes[k]['centrality'])
        centrality_vect['bet'].append(i.nodes[k]['betweenness'])
        centrality_vect['deg'].append(i.nodes[k]['degree'])
centrality_vect = pd.DataFrame(centrality_vect).set_index('user')

# list top eigenvector centrality values
print('Top user eigenvector centralities in November 2018 by component')
centrality_vect = centrality_vect.sort_values(by='eig',ascending=False)
print(centrality_vect.head(20))
print()
print('Top user betweenness centralities in November 2018 by component')
centrality_vect = centrality_vect.sort_values(by='bet',ascending=False)
print(centrality_vect.head(20))
print()
print('Top user degree centralities in November 2018 by component')
centrality_vect = centrality_vect.sort_values(by='deg',ascending=False)
print(centrality_vect.head(20))
Top user eigenvector centralities in November 2018 by component
                 cluster       eig       bet       deg
user                                                  
3GHtweets              0  0.171419  0.001548  0.081756
LifeIsThermal          0  0.171419  0.001548  0.081756
FriendsOScience        0  0.171419  0.001548  0.081756
6esm                   0  0.171419  0.001548  0.081756
GillesnFio             0  0.171419  0.001548  0.081756
EthonRaptor            0  0.171419  0.001548  0.081756
dvibrationz            0  0.107255  0.000174  0.049053
PolAnimalAus           0  0.107255  0.000174  0.049053
NikolovScience         0  0.102231  0.000024  0.046472
swcrisis               0  0.102231  0.000024  0.046472
JaggerMickOZ           0  0.100447  0.000018  0.045611
EcoSenseNow            0  0.098728  0.000009  0.044750
badmoonrising11        0  0.098728  0.000009  0.044750
Tangomitteckel         0  0.098728  0.000009  0.044750
KIVUNature             0  0.098728  0.000009  0.044750
dan613                 0  0.098442  0.000043  0.044750
rln_nelson             0  0.098442  0.000043  0.044750
SylviaD32911201        0  0.096592  0.000024  0.043890
nytimes                0  0.095076  0.014657  0.049914
DamianMcColl           0  0.094912  0.000000  0.043029

Top user betweenness centralities in November 2018 by component
                 cluster            eig       bet       deg
user                                                       
realDonaldTrump        0   1.953294e-03  0.017665  0.030981
nytimes                0   9.507644e-02  0.014657  0.049914
ALT_uscis              0   3.766669e-05  0.009193  0.035284
manamiangry            1   9.457198e-04  0.007925  0.086059
RepSwalwell            0   3.898714e-05  0.002253  0.010327
globalist13903         1   1.585561e-03  0.001834  0.085198
AiNaTow                1   1.585561e-03  0.001834  0.085198
TTownJoe               1   1.585561e-03  0.001834  0.085198
BamaDan78              1   1.585561e-03  0.001834  0.085198
3GHtweets              0   1.714188e-01  0.001548  0.081756
FriendsOScience        0   1.714188e-01  0.001548  0.081756
6esm                   0   1.714188e-01  0.001548  0.081756
GillesnFio             0   1.714188e-01  0.001548  0.081756
EthonRaptor            0   1.714188e-01  0.001548  0.081756
LifeIsThermal          0   1.714188e-01  0.001548  0.081756
KyleKashuv             0   3.677164e-05  0.001440  0.006885
latimes                0   7.436315e-05  0.001431  0.008606
WWF                    0   1.937972e-03  0.001293  0.009466
ScottMorrisonMP        5  4.747971e-321  0.001109  0.018072
BarackObama            0   1.937264e-03  0.001081  0.008606

Top user degree centralities in November 2018 by component
                 cluster           eig           bet       deg
user                                                          
manamiangry            1  9.457198e-04  7.925389e-03  0.086059
globalist13903         1  1.585561e-03  1.834396e-03  0.085198
AiNaTow                1  1.585561e-03  1.834396e-03  0.085198
TTownJoe               1  1.585561e-03  1.834396e-03  0.085198
BamaDan78              1  1.585561e-03  1.834396e-03  0.085198
EthonRaptor            0  1.714188e-01  1.547776e-03  0.081756
3GHtweets              0  1.714188e-01  1.547776e-03  0.081756
FriendsOScience        0  1.714188e-01  1.547776e-03  0.081756
6esm                   0  1.714188e-01  1.547776e-03  0.081756
GillesnFio             0  1.714188e-01  1.547776e-03  0.081756
LifeIsThermal          0  1.714188e-01  1.547776e-03  0.081756
nytimes                0  9.507644e-02  1.465737e-02  0.049914
dvibrationz            0  1.072547e-01  1.743363e-04  0.049053
PolAnimalAus           0  1.072547e-01  1.743363e-04  0.049053
TheophilusPrime        1  2.039371e-04  6.700853e-04  0.046472
swcrisis               0  1.022308e-01  2.420605e-05  0.046472
NikolovScience         0  1.022308e-01  2.420605e-05  0.046472
JaggerMickOZ           0  1.004466e-01  1.764309e-05  0.045611
KAT40811334            3  9.470953e-10  2.124397e-07  0.045611
Jingoman111            3  9.470953e-10  2.124397e-07  0.045611
In [23]:
G = USN_G_112018
pos = nx.spring_layout(G) # obtain positions for each node in the network 

### Interactive plot
edge_x = []
edge_y = []
for edge in G.edges():
    x0,y0 = pos[edge[0]]
    x1,y1 = pos[edge[1]]
    edge_x.append(x0)
    edge_x.append(x1)
    edge_x.append(None)
    edge_y.append(y0)
    edge_y.append(y1)
    edge_y.append(None)

# Creating a "scatter plot" of the edges
edge_trace = go.Scatter(
    x=edge_x, y=edge_y,
    line=dict(width=0.75, color='slategray'), # change the thickness and color of the edges
    hoverinfo='none',
    opacity = 0.5,
    mode='lines',
    showlegend=False)

# Creating the nodes, based on positions
node_x = []
node_y = []
for node in G.nodes():
    x, y = pos[node]
    node_x.append(x)
    node_y.append(y)

# Creating a scatter plot of the nodes
node_trace = go.Scatter(
    x=node_x, y=node_y,
    mode='markers',
    hoverinfo='text',
    marker=dict(
        #showscale=True,
        # colorscale options
        #'Greys' | 'YlGnBu' | 'Greens' | 'YlOrRd' | 'Bluered' | 'RdBu' |
        #'Reds' | 'Blues' | 'Picnic' | 'Rainbow' | 'Portland' | 'Jet' |
        #'Hot' | 'Blackbody' | 'Earth' | 'Electric' | 'Viridis' |
        #colorscale='RdYlBu',
        colorscale = [[0,'rgb(42,35,160)'], [0.25,'rgb(29,145,192)'], [0.5,'rgb(254,227,145)'], [0.75,'rgb(241,105,19)'], [1.0, 'rgb(227,26,28)']],
        line_width=0.5),
        showlegend=False)

node_5bot = []
node_text_5bot = []
for node in G.nodes():
    node_5bot.append(G.nodes[node]['bot_5cat'])
    node_text_5bot.append('Bot Score: '+str(G.nodes[node]['bot_5cat']))
count1, count2 = np.unique(node_5bot, return_counts=True)

node_eig = []
node_text_eig = []
node_b = []
for node in G.nodes():
    node_eig.append(round(G.nodes[node]['centrality'],4))
    node_b.append(round(G.nodes[node]['betweenness'],4))
    node_text_eig.append('User: '+str(U['key'][node])+', Influence: '
                         +str(round(G.nodes[node]['centrality'],4))
                         +', Betweenness: '+str(round(G.nodes[node]['betweenness'],4))
                         +', Degree: '+str(round(G.nodes[node]['degree'],4)))

node_trace.marker.color = node_5bot
node_eig3 = [10 + i*100 for i in node_eig]
node_trace.marker.size = node_eig3
node_trace.text = node_text_eig

text = 'Plot of the largest connected subnetwork, <br>\
        displaying values of Bot Score (color of node), <br>\
        eigenvector centrality (proportional to the size of node), <br>\
        betweenness centrality (hover mouse), <br>\
        and degree centrality (hover mouse).'

#Creating the figure 
fig = go.Figure(data = [edge_trace, node_trace],
             layout=go.Layout(
                title='User Network (November 2018)',
                titlefont_size=24,
                showlegend=True,
                plot_bgcolor = 'rgb(224,243,219)',
                hovermode='closest',
                margin=dict(b=20,l=5,r=5,t=40),
                annotations=[ dict(
                    text=text,
                    showarrow=False,
                    xref="paper", yref="paper",
                    align="left",
                    x=0.005, y=-0.002 ) ],
                xaxis=dict(showgrid=False, zeroline=False, showticklabels=False),
                yaxis=dict(showgrid=False, zeroline=False, showticklabels=False))
                )

fig.add_trace(go.Scatter(
    x=[0.005],
    y=[-0.002],
    #visible = False,
    showlegend=True,
    mode='markers',
    marker = dict(color = 'rgb(42,35,160)', size =0.1),
    name="1"       # this sets its legend entry
))


fig.add_trace(go.Scatter(
    x=[0.005],
    y=[-0.002],
    #visible = False,
    showlegend=True,
    mode='markers',
    marker = dict(color = 'rgb(29,145,192)', size =0.1),
    name="2"
))

fig.add_trace(go.Scatter(
    x=[0.005],
    y=[-0.002],
    #visible = False,
    showlegend=True,
    mode='markers',
    marker = dict(color = 'rgb(254,227,145)', size =0.1),
    name="3"
))

fig.add_trace(go.Scatter(
    x=[0.005],
    y=[-0.002],
    #visible = False,
    showlegend=True,
    mode='markers',
    marker = dict(color = 'rgb(241,105,19)', size =0.1),
    name="4"
))

fig.add_trace(go.Scatter(
    x=[0.005],
    y=[-0.002],
    #visible = False,
    showlegend=True,
    mode='markers',
    marker = dict(color = 'rgb(227,26,28)', size =0.1),
    name="5"
))

fig.update_layout(legend= dict(itemsizing='constant', itemclick='toggleothers', bgcolor='rgb(224,243,219)'))


fig.update_layout(legend_title='<b> Bot Score </b>')

fig.show()

plotly.offline.plot(fig, filename = 'figures/USN-network-112018-unpatched.html', auto_open=False)
Out[23]:
'figures/USN-network-112018-unpatched.html'
4.2. November 2019 User Co-occurrence (User-User) Network.
In [24]:
# User Bot Distribution
B5CAT_vect = []
colormap_seismic = cm.get_cmap('RdYlBu')
B5CAT_color = list(reversed([colormap_seismic(i) for i in np.linspace(0,1,5)]))
for i in USN_G_112019.nodes():
    B5CAT_vect.append(USN_G_112019.nodes[i]['bot_5cat'])
u, c = np.unique(B5CAT_vect,return_counts=True)
c_results = {'':c}

def survey(results, category_names):
    """
    Parameters
    ----------
    results : dict
        A mapping from question labels to a list of answers per category.
        It is assumed all lists contain the same number of entries and that
        it matches the length of *category_names*.
    category_names : list of str
        The category labels.
    """
    labels = list(results.keys())
    data = np.array(list(results.values()))
    data_cum = data.cumsum(axis=1)
    category_colors = plt.get_cmap('RdYlGn')(
        np.linspace(0.15, 0.85, data.shape[1]))

    fig, ax = plt.subplots(figsize=(14, 3))
    ax.invert_yaxis()
    ax.xaxis.set_visible(False)
    ax.set_xlim(0, np.sum(data, axis=1).max())

    for i, (colname, color) in enumerate(zip(category_names, B5CAT_color)):
        widths = data[:, i]
        starts = data_cum[:, i] - widths
        
        ax.barh(labels, widths, left=starts, height=0.5,
                label=str(colname)+' (users='+str(int(widths[0]))+')', color=color)
        xcenters = starts + widths / 2

        r, g, b, _ = color
        text_color = 'white' if r * g * b < 0.5 else 'darkgrey'
        for y, (x, c) in enumerate(zip(xcenters, widths)):
            ax.text(x, y,'', ha='center', va='center',
                    color=text_color)
    ax.legend(bbox_to_anchor=(0, 1.02,1,.102),
              loc=3, fontsize="small",mode='expand',ncol=5)

    return fig, ax

fig, ax = survey(c_results, u)
ax.set_title('user cooccurrence network \n combined bot value distribution November 2019 \n\n')
plt.tight_layout()
plt.savefig('figures/USN-combinedBotValueDistribution-112019.png')
plt.show()
In [25]:
# users cooccurrence frequency distributions
freq_vect = {'cluster':[],'edge':[],'frequency':[]}
for j, i in enumerate(USN_G_112019_C):
    for k in i.edges():
        freq_vect['cluster'].append(j)
        freq_vect['edge'].append((U['key'][k[0]],U['key'][k[1]]))
        freq_vect['frequency'].append(i.edges[k]['frequency'])
freq_vect = pd.DataFrame(freq_vect).set_index('edge').sort_values(by='frequency',ascending=False)

# list top edge frequency values
print('Top edges in November 2019')
print(freq_vect.head(20))
Top edges in November 2019
                               cluster  frequency
edge                                             
(Jeeneree, Upst8Downst8)             2         55
(SteveJaneski, CPresser_2)           2         55
(PatBrigman, ResistingLib)           2         55
(PatBrigman, marcynorsk)             2         55
(PatBrigman, SafiyahNoor1)           2         55
(PatBrigman, CPresser_2)             2         55
(PatBrigman, sis_boom_baaah)         2         55
(PatBrigman, bluedgal)               2         55
(PatBrigman, pelotonattacker)        2         55
(PatBrigman, drjjr500)               2         55
(PatBrigman, bdorbin)                2         55
(PatBrigman, dlspace108)             2         55
(PatBrigman, BetsyGervasi)           2         55
(PatBrigman, IsabellaAmore47)        2         55
(PatBrigman, SamCatClemens)          2         55
(PatBrigman, john44909381)           2         55
(dlspace108, ella_arson)             2         55
(dlspace108, janforney1)             2         55
(dlspace108, retiredfirstsgt)        2         55
(PatBrigman, wildwillow65)           2         55
In [26]:
# user centralities distributions
centrality_vect = {'cluster':[],'user':[],'eig':[],'bet':[],'deg':[]}
for j, i in enumerate(USN_G_112019_C):
    for k in i.nodes():
        centrality_vect['cluster'].append(j)
        centrality_vect['user'].append(U['key'][k])
        centrality_vect['eig'].append(i.nodes[k]['centrality'])
        centrality_vect['bet'].append(i.nodes[k]['betweenness'])
        centrality_vect['deg'].append(i.nodes[k]['degree'])
centrality_vect = pd.DataFrame(centrality_vect).set_index('user')

# list top eigenvector centrality values
print('Top user eigenvector centralities in November 2019 by component')
centrality_vect = centrality_vect.sort_values(by='eig',ascending=False)
print(centrality_vect.head(20))
print()
print('Top user betweenness centralities in November 2019 by component')
centrality_vect = centrality_vect.sort_values(by='bet',ascending=False)
print(centrality_vect.head(20))
print()
print('Top user degree centralities in November 2019 by component')
centrality_vect = centrality_vect.sort_values(by='deg',ascending=False)
print(centrality_vect.head(20))
Top user eigenvector centralities in November 2019 by component
                 cluster       eig       bet       deg
user                                                  
POTUS                  1  0.127732  0.002274  0.091320
WWF                    1  0.127613  0.001182  0.086799
Tesla                  1  0.127580  0.000501  0.087703
latimes                1  0.127556  0.000287  0.086799
ktlaENT                1  0.127533  0.000069  0.085895
CarterLibrary          1  0.127533  0.000069  0.085895
EnviroAction           1  0.127533  0.000069  0.085895
SierraClub             1  0.127533  0.000069  0.085895
MotherNatureNet        1  0.127533  0.000069  0.085895
KABCRadio              1  0.127533  0.000069  0.085895
FLOTUS                 1  0.127533  0.000069  0.085895
hrhprincesshaya        1  0.127533  0.000069  0.085895
SanBernardinoNF        1  0.127533  0.000069  0.085895
Chrysler               1  0.127533  0.000069  0.085895
jayleno                1  0.127533  0.000069  0.085895
ArchDigest             1  0.127533  0.000069  0.085895
Chevron                1  0.127533  0.000069  0.085895
cartalk                1  0.127533  0.000069  0.085895
KAcom                  1  0.127533  0.000069  0.085895
HRHPrincessKK          1  0.127533  0.000069  0.085895

Top user betweenness centralities in November 2019 by component
                 cluster           eig       bet       deg
user                                                      
BetsyGervasi           2  6.738923e-09  0.004681  0.092224
realDonaldTrump        1  5.959547e-03  0.004124  0.015371
DawnTJ90               0  9.842841e-05  0.003488  0.141953
POTUS                  1  1.277320e-01  0.002274  0.091320
GillesnFio             0  8.072825e-05  0.001849  0.106691
SpeakerPelosi          1  7.991202e-02  0.001845  0.049729
ScottMorrisonMP        3  1.992316e-45  0.001740  0.016275
noplaceforsheep        3  9.091269e-45  0.001350  0.022604
JaggerMickOZ           0  6.834183e-05  0.001203  0.091320
WWF                    1  1.276128e-01  0.001182  0.086799
JuliePi31415926        0  4.437419e-05  0.001118  0.088608
RoyPentland            0  4.437419e-05  0.001118  0.088608
GladysB                3  6.066360e-45  0.001110  0.014467
GOP                    1  8.061413e-05  0.001065  0.004521
ewarren                1  1.210688e-03  0.001056  0.008137
dbongino               1  8.061496e-05  0.000852  0.004521
DonaldJTrumpJr         1  8.061539e-05  0.000851  0.004521
EthonRaptor            0  8.554144e-05  0.000829  0.102170
NHLFlames              4  5.366232e-38  0.000775  0.019892
NikolovScience         0  8.504007e-05  0.000768  0.100362

Top user degree centralities in November 2019 by component
                 cluster           eig       bet       deg
user                                                      
DawnTJ90               0  9.842841e-05  0.003488  0.141953
GillesnFio             0  8.072825e-05  0.001849  0.106691
EthonRaptor            0  8.554144e-05  0.000829  0.102170
Kenneth72712993        0  8.504007e-05  0.000768  0.100362
NikolovScience         0  8.504007e-05  0.000768  0.100362
BetsyGervasi           2  6.738923e-09  0.004681  0.092224
POTUS                  1  1.277320e-01  0.002274  0.091320
JaggerMickOZ           0  6.834183e-05  0.001203  0.091320
swcrisis               0  7.575872e-05  0.000548  0.089512
RoyPentland            0  4.437419e-05  0.001118  0.088608
FrankWi74044551        0  7.488928e-05  0.000540  0.088608
JuliePi31415926        0  4.437419e-05  0.001118  0.088608
Tesla                  1  1.275799e-01  0.000501  0.087703
WWF                    1  1.276128e-01  0.001182  0.086799
latimes                1  1.275563e-01  0.000287  0.086799
forestservice          1  1.275333e-01  0.000069  0.085895
IENearth               1  1.275333e-01  0.000069  0.085895
DDispatchNews          1  1.275333e-01  0.000069  0.085895
guardianeco            1  1.275333e-01  0.000069  0.085895
CarterLibrary          1  1.275333e-01  0.000069  0.085895
In [27]:
G = USN_G_112019
pos = nx.spring_layout(G) # obtain positions for each node in the network 

### Interactive plot
edge_x = []
edge_y = []
for edge in G.edges():
    x0,y0 = pos[edge[0]]
    x1,y1 = pos[edge[1]]
    edge_x.append(x0)
    edge_x.append(x1)
    edge_x.append(None)
    edge_y.append(y0)
    edge_y.append(y1)
    edge_y.append(None)

# Creating a "scatter plot" of the edges
edge_trace = go.Scatter(
    x=edge_x, y=edge_y,
    line=dict(width=0.75, color='slategray'), # change the thickness and color of the edges
    hoverinfo='none',
    opacity = 0.5,
    mode='lines',
    showlegend=False)

# Creating the nodes, based on positions
node_x = []
node_y = []
for node in G.nodes():
    x, y = pos[node]
    node_x.append(x)
    node_y.append(y)

# Creating a scatter plot of the nodes
node_trace = go.Scatter(
    x=node_x, y=node_y,
    mode='markers',
    hoverinfo='text',
    marker=dict(
        #showscale=True,
        # colorscale options
        #'Greys' | 'YlGnBu' | 'Greens' | 'YlOrRd' | 'Bluered' | 'RdBu' |
        #'Reds' | 'Blues' | 'Picnic' | 'Rainbow' | 'Portland' | 'Jet' |
        #'Hot' | 'Blackbody' | 'Earth' | 'Electric' | 'Viridis' |
        #colorscale='RdYlBu',
        colorscale = [[0,'rgb(42,35,160)'], [0.25,'rgb(29,145,192)'], [0.5,'rgb(254,227,145)'], [0.75,'rgb(241,105,19)'], [1.0, 'rgb(227,26,28)']],
        line_width=0.5),
        showlegend=False)

node_5bot = []
node_text_5bot = []
for node in G.nodes():
    node_5bot.append(G.nodes[node]['bot_5cat'])
    node_text_5bot.append('Bot Score: '+str(G.nodes[node]['bot_5cat']))
count1, count2 = np.unique(node_5bot, return_counts=True)

node_eig = []
node_text_eig = []
node_b = []
for node in G.nodes():
    node_eig.append(round(G.nodes[node]['centrality'],4))
    node_b.append(round(G.nodes[node]['betweenness'],4))
    node_text_eig.append('User: '+str(U['key'][node])+', Influence: '
                         +str(round(G.nodes[node]['centrality'],4))
                         +', Betweenness: '+str(round(G.nodes[node]['betweenness'],4))
                         +', Degree: '+str(round(G.nodes[node]['degree'],4)))

node_trace.marker.color = node_5bot
node_eig3 = [10 + i*100 for i in node_eig]
node_trace.marker.size = node_eig3
node_trace.text = node_text_eig

text = 'Plot of the largest connected subnetwork, <br>\
        displaying values of Bot Score (color of node), <br>\
        eigenvector centrality (proportional to the size of node), <br>\
        betweenness centrality (hover mouse), <br>\
        and degree centrality (hover mouse).'

#Creating the figure 
fig = go.Figure(data = [edge_trace, node_trace],
             layout=go.Layout(
                title='User Network (November 2019)',
                titlefont_size=24,
                showlegend=True,
                plot_bgcolor = 'rgb(224,243,219)',
                hovermode='closest',
                margin=dict(b=20,l=5,r=5,t=40),
                annotations=[ dict(
                    text=text,
                    showarrow=False,
                    xref="paper", yref="paper",
                    align="left",
                    x=0.005, y=-0.002 ) ],
                xaxis=dict(showgrid=False, zeroline=False, showticklabels=False),
                yaxis=dict(showgrid=False, zeroline=False, showticklabels=False))
                )

fig.add_trace(go.Scatter(
    x=[0.005],
    y=[-0.002],
    #visible = False,
    showlegend=True,
    mode='markers',
    marker = dict(color = 'rgb(42,35,160)', size =0.1),
    name="1"       # this sets its legend entry
))


fig.add_trace(go.Scatter(
    x=[0.005],
    y=[-0.002],
    #visible = False,
    showlegend=True,
    mode='markers',
    marker = dict(color = 'rgb(29,145,192)', size =0.1),
    name="2"
))

fig.add_trace(go.Scatter(
    x=[0.005],
    y=[-0.002],
    #visible = False,
    showlegend=True,
    mode='markers',
    marker = dict(color = 'rgb(254,227,145)', size =0.1),
    name="3"
))

fig.add_trace(go.Scatter(
    x=[0.005],
    y=[-0.002],
    #visible = False,
    showlegend=True,
    mode='markers',
    marker = dict(color = 'rgb(241,105,19)', size =0.1),
    name="4"
))

fig.add_trace(go.Scatter(
    x=[0.005],
    y=[-0.002],
    #visible = False,
    showlegend=True,
    mode='markers',
    marker = dict(color = 'rgb(227,26,28)', size =0.1),
    name="5"
))

fig.update_layout(legend= dict(itemsizing='constant', itemclick='toggleothers', bgcolor='rgb(224,243,219)'))


fig.update_layout(legend_title='<b> Bot Score </b>')

fig.show()

plotly.offline.plot(fig, filename = 'figures/USN-network-112019-unpatched.html', auto_open=False)
Out[27]:
'figures/USN-network-112019-unpatched.html'
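A side note on the edge-drawing loop above: appending `None` after each edge's endpoint coordinates lets Plotly render every edge as one `Scatter` trace, since `None` breaks the line. A minimal stdlib sketch of that flattening pattern, using made-up toy positions (the names `flatten_edges`, `pos`, and the coordinates are illustrative only):

```python
# Sketch of the None-separator pattern used above: Plotly treats None as a
# gap in a line trace, so all edges fit in a single trace instead of one
# trace per edge.

def flatten_edges(edges, pos):
    """Interleave edge endpoint coordinates with None separators."""
    xs, ys = [], []
    for u, v in edges:
        x0, y0 = pos[u]
        x1, y1 = pos[v]
        xs.extend([x0, x1, None])
        ys.extend([y0, y1, None])
    return xs, ys

# toy positions for three nodes (assumed, for illustration only)
pos = {'a': (0.0, 0.0), 'b': (1.0, 0.0), 'c': (0.5, 1.0)}
edges = [('a', 'b'), ('b', 'c')]
xs, ys = flatten_edges(edges, pos)
print(xs)  # [0.0, 1.0, None, 1.0, 0.5, None]
```

The resulting `xs`/`ys` lists can be passed directly as the `x`/`y` arguments of a `go.Scatter(mode='lines')` trace, exactly as the cell above does with `edge_x`/`edge_y`.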
3.3. November 2018 Hashtag Co-occurrence Network.
In [28]:
# hashtags cooccurrence frequency distributions
freq_vect = {'cluster':[],'edge':[],'frequency':[]}
for j, i in enumerate(HTGS_G_112018_C):
    for k in i.edges():
        freq_vect['cluster'].append(j)
        freq_vect['edge'].append(k)
        freq_vect['frequency'].append(i.edges[k]['frequency'])
freq_vect = pd.DataFrame(freq_vect).set_index('edge').sort_values(by='frequency',ascending=False)

# list top hashtag frequency values
print('Top edges in November 2018')
print(freq_vect.head(20))
Top edges in November 2018
                                cluster  frequency
edge                                              
(#campfire, #buttecounty)             0         86
(#campfire, #chico)                   0         77
(#chico, #buttecounty)                0         60
(#campfire, #paradise)                0         57
(#campfire, #campfirepets)            0         52
(#campfire, #magalia)                 0         39
(#campfire, #buttestrong)             0         37
(#campfire, #campfireparadise)        0         35
(#chico, #campfirepets)               0         35
(#campfirepets, #buttecounty)         0         35
(#campfire, #oroville)                0         34
(#campfirepets, #magalia)             0         34
(#campfirepets, #oroville)            0         34
(#campfire, #paradisestrong)          0         33
(#magalia, #buttecounty)              0         33
(#chico, #oroville)                   0         32
(#chico, #magalia)                    0         31
(#campfire, #paradiseca)              0         31
(#buttecounty, #paradiseca)           0         31
(#campfire, #concow)                  0         31
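The co-occurrence graphs themselves (`HTGS_G_112018_C`) arrive precomputed, so the notebook only reads off their `frequency` edge attribute. As a hedged sketch of how edge frequencies like those in the table above can be derived, the following counts unordered hashtag pairs per tweet with `Counter` and `combinations`; the tweet data and the helper name `cooccurrence_counts` are made up for illustration:

```python
# Hypothetical sketch: counting hashtag co-occurrence edge frequencies from
# per-tweet hashtag lists. The real graphs are built elsewhere; this toy
# data is invented.
from collections import Counter
from itertools import combinations

def cooccurrence_counts(tweets_hashtags):
    """Count how often each unordered hashtag pair appears in the same tweet."""
    counts = Counter()
    for tags in tweets_hashtags:
        # sort so (#a, #b) and (#b, #a) map to the same edge key
        for pair in combinations(sorted(set(tags)), 2):
            counts[pair] += 1
    return counts

tweets = [
    ['#campfire', '#buttecounty', '#chico'],
    ['#campfire', '#buttecounty'],
    ['#campfire', '#paradise'],
]
counts = cooccurrence_counts(tweets)
print(counts[('#buttecounty', '#campfire')])  # 2
```

Each resulting pair would become a graph edge with its count stored as the `frequency` attribute read in the cell above.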
In [29]:
# hashtag centralities distributions
centrality_vect = {'cluster':[],'user':[],'eig':[],'bet':[],'deg':[]}
for j, i in enumerate(HTGS_G_112018_C):
    for k in i.nodes():
        centrality_vect['cluster'].append(j)
        centrality_vect['user'].append(k)
        centrality_vect['eig'].append(i.nodes[k]['centrality'])
        centrality_vect['bet'].append(i.nodes[k]['betweenness'])
        centrality_vect['deg'].append(i.nodes[k]['degree'])
centrality_vect = pd.DataFrame(centrality_vect).set_index('user')

# list top eigenvector centrality values
print('Top hashtag eigenvector centralities in November 2018 by component')
centrality_vect = centrality_vect.sort_values(by='eig',ascending=False)
print(centrality_vect.head(20))
print()
print('Top hashtag betweenness centralities in November 2018 by component')
centrality_vect = centrality_vect.sort_values(by='bet',ascending=False)
print(centrality_vect.head(20))
print()
print('Top hashtag degree centralities in November 2018 by component')
centrality_vect = centrality_vect.sort_values(by='deg',ascending=False)
print(centrality_vect.head(20))
Top hashtag eigenvector centralities in November 2018 by component
            cluster       eig       bet       deg
user                                             
#fire             0  0.240422  0.196408  0.152790
#campfire         0  0.231864  0.373480  0.232388
#wildfires        0  0.161386  0.050354  0.056725
#you              0  0.158527  0.005887  0.046661
#travel           0  0.156613  0.011003  0.045746
#camping          0  0.153292  0.002833  0.039341
#water            0  0.152552  0.019405  0.039341
#man              0  0.151214  0.016058  0.039341
#trending         0  0.150822  0.000841  0.032022
#history          0  0.150495  0.004717  0.034767
#wisdom           0  0.150079  0.000847  0.032022
#can              0  0.149820  0.001685  0.032022
#pooh             0  0.149364  0.000000  0.030192
#prevent          0  0.149364  0.000000  0.030192
#camper           0  0.149364  0.000000  0.030192
#woman            0  0.149364  0.000000  0.030192
#people           0  0.149364  0.000000  0.030192
#sage             0  0.149364  0.000000  0.030192
#honey            0  0.149364  0.000000  0.030192
#yogi             0  0.149364  0.000000  0.030192

Top hashtag betweenness centralities in November 2018 by component
                   cluster       eig       bet       deg
user                                                    
#campfire                0  0.231864  0.373480  0.232388
#fire                    0  0.240422  0.196408  0.152790
#climatechange           0  0.018847  0.156800  0.068618
#california              0  0.046910  0.079048  0.073193
#art                     0  0.054071  0.073290  0.077768
#wildfires               0  0.161386  0.050354  0.056725
#video                   0  0.013173  0.048510  0.035682
#blackfriday             0  0.007504  0.045995  0.025618
#christmas               0  0.046793  0.043839  0.054895
#trump                   0  0.001155  0.029333  0.032022
#bushfires               0  0.008268  0.028920  0.041171
#heat                    0  0.034505  0.027987  0.052150
#breaking                0  0.012745  0.025913  0.017383
#bushfire                0  0.003656  0.025683  0.030192
#holyfire                0  0.008142  0.022675  0.015554
#australia               0  0.013316  0.022120  0.039341
#writing                 0  0.012949  0.020069  0.024703
#water                   0  0.152552  0.019405  0.039341
#thursdaythoughts        0  0.023002  0.019255  0.030192
#events                  0  0.007026  0.018690  0.011894

Top hashtag degree centralities in November 2018 by component
                cluster       eig       bet       deg
user                                                 
#campfire             0  0.231864  0.373480  0.232388
#fire                 0  0.240422  0.196408  0.152790
#art                  0  0.054071  0.073290  0.077768
#flame                0  0.075597  0.011794  0.074108
#california           0  0.046910  0.079048  0.073193
#climatechange        0  0.018847  0.156800  0.068618
#campfirepets         0  0.046132  0.017086  0.066789
#buttecounty          0  0.048617  0.013943  0.063129
#wildfires            0  0.161386  0.050354  0.056725
#christmas            0  0.046793  0.043839  0.054895
#atlanta              0  0.018225  0.008424  0.052150
#heat                 0  0.034505  0.027987  0.052150
#paradise             0  0.037040  0.005535  0.048490
#you                  0  0.158527  0.005887  0.046661
#travel               0  0.156613  0.011003  0.045746
#chico                0  0.038579  0.007378  0.045746
#rap                  0  0.018601  0.018549  0.043916
#hiphop               0  0.020833  0.015011  0.043916
#bushfires            0  0.008268  0.028920  0.041171
#magalia              0  0.036443  0.001032  0.041171
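To make the three columns above concrete: degree centrality is a node's degree divided by `n - 1`, and eigenvector centrality is the node's entry in the leading eigenvector of the adjacency matrix. A toy sketch on an invented four-node graph, approximating the eigenvector by power iteration and scaling to unit Euclidean norm (the convention networkx uses; the graph here is not the notebook's data):

```python
# Toy illustration of degree and eigenvector centrality.
import math

adj = {  # small undirected graph, made up for illustration
    'a': ['b', 'c', 'd'],
    'b': ['a', 'c'],
    'c': ['a', 'b'],
    'd': ['a'],
}
n = len(adj)

# degree centrality: fraction of other nodes each node touches
deg_cent = {v: len(nbrs) / (n - 1) for v, nbrs in adj.items()}
print(deg_cent['a'])  # 1.0 : 'a' is adjacent to every other node

# eigenvector centrality via power iteration
x = {v: 1.0 for v in adj}
for _ in range(100):
    x_new = {v: sum(x[u] for u in adj[v]) for v in adj}
    norm = math.sqrt(sum(val * val for val in x_new.values()))
    x = {v: val / norm for v, val in x_new.items()}
print(round(x['a'], 2))
```

Note how a node scores high on eigenvector centrality by being connected to other well-connected nodes, which is why the tables above rank `#fire` and `#campfire` differently across the three measures.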
In [30]:
G = HTGS_G_112018
pos = nx.spring_layout(G) # obtain positions for each node in the network 

### Interactive plot
edge_x = []
edge_y = []
for edge in G.edges():
    x0,y0 = pos[edge[0]]
    x1,y1 = pos[edge[1]]
    edge_x.append(x0)
    edge_x.append(x1)
    edge_x.append(None)
    edge_y.append(y0)
    edge_y.append(y1)
    edge_y.append(None)

# Creating a "scatter plot" of the edges
edge_trace = go.Scatter(
    x=edge_x, y=edge_y,
    line=dict(width=0.75, color='slategray'), # change the thickness and color of the edges
    hoverinfo='none',
    opacity = 0.5,
    mode='lines',
    showlegend=False)

# Creating the nodes, based on positions
node_x = []
node_y = []
for node in G.nodes():
    x, y = pos[node]
    node_x.append(x)
    node_y.append(y)

# Creating a scatter plot of the nodes
node_trace = go.Scatter(
    x=node_x, y=node_y,
    mode='markers',
    hoverinfo='text',
    marker=dict(
        #showscale=True,
        # colorscale options
        #'Greys' | 'YlGnBu' | 'Greens' | 'YlOrRd' | 'Bluered' | 'RdBu' |
        #'Reds' | 'Blues' | 'Picnic' | 'Rainbow' | 'Portland' | 'Jet' |
        #'Hot' | 'Blackbody' | 'Earth' | 'Electric' | 'Viridis' |
        #colorscale='RdYlBu',
        colorscale = [[0,'rgb(42,35,160)'], [0.25,'rgb(29,145,192)'], [0.5,'rgb(254,227,145)'], [0.75,'rgb(241,105,19)'], [1.0, 'rgb(227,26,28)']],
        line_width=0.5),
        showlegend=False)

node_deg = []
node_text_deg = []
for node in G.nodes():
    node_deg.append(G.nodes[node]['degree'])
    node_text_deg.append('Degree: '+str(G.nodes[node]['degree']))
count1, count2 = np.unique(node_deg, return_counts=True)

node_eig = []
node_text_eig = []
node_b = []
for node in G.nodes():
    node_eig.append(round(G.nodes[node]['centrality'],4))
    node_b.append(round(G.nodes[node]['betweenness'],4))
    node_text_eig.append('Hashtag: '+str(node)+', Influence: '
                         +str(round(G.nodes[node]['centrality'],4))
                         +', Betweenness: '+str(round(G.nodes[node]['betweenness'],4))
                         +', Degree: '+str(round(G.nodes[node]['degree'],4)))

node_deg2 = [10 + i*100 for i in node_deg]
node_trace.marker.size = node_deg2
node_trace.text = node_text_eig

text = 'Plot of the largest connected subnetwork, <br>\
        displaying degree centrality (proportional to the size of node), <br>\
        eigenvector centrality (hover mouse), <br>\
        and betweenness centrality (hover mouse).'

#Creating the figure 
fig = go.Figure(data = [edge_trace, node_trace],
             layout=go.Layout(
                title='Hashtag Network (November 2018)',
                titlefont_size=24,
                showlegend=True,
                plot_bgcolor = 'rgb(224,243,219)',
                hovermode='closest',
                margin=dict(b=20,l=5,r=5,t=40),
                annotations=[ dict(
                    text=text,
                    showarrow=False,
                    xref="paper", yref="paper",
                    align="left",
                    x=0.005, y=-0.002 ) ],
                xaxis=dict(showgrid=False, zeroline=False, showticklabels=False),
                yaxis=dict(showgrid=False, zeroline=False, showticklabels=False))
                )

fig.update_layout(legend= dict(itemsizing='constant', itemclick='toggleothers', bgcolor='rgb(224,243,219)'))

fig.show()

plotly.offline.plot(fig, filename = 'figures/HTGS-network-112018.html', auto_open=False)
Out[30]:
'figures/HTGS-network-112018.html'
3.4. November 2019 Hashtag Co-occurrence Network.
In [31]:
# hashtags cooccurrence frequency distributions
freq_vect = {'cluster':[],'edge':[],'frequency':[]}
for j, i in enumerate(HTGS_G_112019_C):
    for k in i.edges():
        freq_vect['cluster'].append(j)
        freq_vect['edge'].append(k)
        freq_vect['frequency'].append(i.edges[k]['frequency'])
freq_vect = pd.DataFrame(freq_vect).set_index('edge').sort_values(by='frequency',ascending=False)

# list top hashtag frequency values
print('Top edges in November 2019')
print(freq_vect.head(20))
Top edges in November 2019
                                  cluster  frequency
edge                                                
(#7news, #bushfires)                    0         25
(#campfire, #california)                0         15
(#campfire, #cawx)                      0         13
(#australia, #bushfires)                0          8
(#auspol, #climateemergency)            0          8
(#breaking, #campfire)                  0          7
(#icac, #corruptionkills)               0          7
(#climateemergency, #bushfires)         0          7
(#auspol, #bushfires)                   0          7
(#auspol19, #icac)                      0          7
(#icac, #doubledissolution)             0          7
(#icac, #generalstrike)                 0          7
(#icac, #bushfires)                     0          7
(#icac, #climateemergency)              0          7
(#rap, #hiphop)                         0          7
(#bushfires, #climatechange)            0          7
(#icac, #droughtemergency)              0          7
(#blackfriday, #blackfriday2019)        0          6
(#giveaway, #bonfire)                   0          6
(#auspol, #climatechange)               0          6
In [32]:
# hashtag centralities distributions
centrality_vect = {'cluster':[],'user':[],'eig':[],'bet':[],'deg':[]}
for j, i in enumerate(HTGS_G_112019_C):
    for k in i.nodes():
        centrality_vect['cluster'].append(j)
        centrality_vect['user'].append(k)
        centrality_vect['eig'].append(i.nodes[k]['centrality'])
        centrality_vect['bet'].append(i.nodes[k]['betweenness'])
        centrality_vect['deg'].append(i.nodes[k]['degree'])
centrality_vect = pd.DataFrame(centrality_vect).set_index('user')

# list top eigenvector centrality values
print('Top hashtag eigenvector centralities in November 2019 by component')
centrality_vect = centrality_vect.sort_values(by='eig',ascending=False)
print(centrality_vect.head(20))
print()
print('Top hashtag betweenness centralities in November 2019 by component')
centrality_vect = centrality_vect.sort_values(by='bet',ascending=False)
print(centrality_vect.head(20))
print()
print('Top hashtag degree centralities in November 2019 by component')
centrality_vect = centrality_vect.sort_values(by='deg',ascending=False)
print(centrality_vect.head(20))
Top hashtag eigenvector centralities in November 2019 by component
                 cluster       eig       bet       deg
user                                                  
#fire                  0  0.295370  0.096517  0.099558
#hiphop                0  0.176294  0.023897  0.057522
#rap                   0  0.176185  0.026072  0.057522
#heat                  0  0.168089  0.043737  0.067478
#music                 0  0.159386  0.012303  0.039823
#soundcloud            0  0.158853  0.013621  0.049779
#artist                0  0.151231  0.010433  0.037611
#texas                 0  0.143598  0.009098  0.030973
#producer              0  0.143412  0.002601  0.028761
#work                  0  0.137909  0.004429  0.026549
#satxartist            0  0.137377  0.000000  0.024336
#undergroundrap        0  0.137377  0.000000  0.024336
#radioplay             0  0.137377  0.000000  0.024336
#promoter              0  0.137377  0.000000  0.024336
#koe                   0  0.137377  0.000000  0.024336
#independent           0  0.137377  0.000000  0.024336
#spotify               0  0.137377  0.000000  0.024336
#satx                  0  0.137377  0.000000  0.024336
#rapper                0  0.137377  0.000000  0.024336
#upcomingrap           0  0.137377  0.000000  0.024336

Top hashtag betweenness centralities in November 2019 by component
                cluster       eig       bet       deg
user                                                 
#blackfriday          0  0.086459  0.274672  0.183628
#climatechange        0  0.011427  0.128975  0.064159
#campfire             0  0.033229  0.098988  0.055310
#fire                 0  0.295370  0.096517  0.099558
#bushfires            0  0.011559  0.071295  0.070796
#amazon               0  0.030508  0.059839  0.038717
#heat                 0  0.168089  0.043737  0.067478
#art                  0  0.048184  0.042057  0.046460
#australia            0  0.030337  0.039499  0.035398
#auspol               0  0.006022  0.033083  0.049779
#wildfires            0  0.028919  0.032120  0.023230
#us                   0  0.008771  0.028129  0.027655
#rap                  0  0.176185  0.026072  0.057522
#trump                0  0.001195  0.025917  0.022124
#qanon                0  0.000613  0.024166  0.013274
#cdnpoli              0  0.006609  0.024020  0.023230
#hiphop               0  0.176294  0.023897  0.057522
#cats                 0  0.006892  0.022130  0.022124
#maga                 0  0.000627  0.020614  0.018805
#water                0  0.029222  0.019569  0.022124

Top hashtag degree centralities in November 2019 by component
                   cluster       eig       bet       deg
user                                                    
#blackfriday             0  0.086459  0.274672  0.183628
#fire                    0  0.295370  0.096517  0.099558
#bushfires               0  0.011559  0.071295  0.070796
#heat                    0  0.168089  0.043737  0.067478
#climatechange           0  0.011427  0.128975  0.064159
#rap                     0  0.176185  0.026072  0.057522
#hiphop                  0  0.176294  0.023897  0.057522
#campfire                0  0.033229  0.098988  0.055310
#soundcloud              0  0.158853  0.013621  0.049779
#auspol                  0  0.006022  0.033083  0.049779
#art                     0  0.048184  0.042057  0.046460
#climateemergency        0  0.006070  0.013023  0.042035
#music                   0  0.159386  0.012303  0.039823
#amazon                  0  0.030508  0.059839  0.038717
#artist                  0  0.151231  0.010433  0.037611
#australia               0  0.030337  0.039499  0.035398
#trending                0  0.050909  0.016233  0.034292
#digitalart              0  0.012465  0.004660  0.033186
#texas                   0  0.143598  0.009098  0.030973
#christmas               0  0.049720  0.016904  0.030973
In [33]:
G = HTGS_G_112019
pos = nx.spring_layout(G) # obtain positions for each node in the network 

### Interactive plot
edge_x = []
edge_y = []
for edge in G.edges():
    x0,y0 = pos[edge[0]]
    x1,y1 = pos[edge[1]]
    edge_x.append(x0)
    edge_x.append(x1)
    edge_x.append(None)
    edge_y.append(y0)
    edge_y.append(y1)
    edge_y.append(None)

# Creating a "scatter plot" of the edges
edge_trace = go.Scatter(
    x=edge_x, y=edge_y,
    line=dict(width=0.75, color='slategray'), # change the thickness and color of the edges
    hoverinfo='none',
    opacity = 0.5,
    mode='lines',
    showlegend=False)

# Creating the nodes, based on positions
node_x = []
node_y = []
for node in G.nodes():
    x, y = pos[node]
    node_x.append(x)
    node_y.append(y)

# Creating a scatter plot of the nodes
node_trace = go.Scatter(
    x=node_x, y=node_y,
    mode='markers',
    hoverinfo='text',
    marker=dict(
        #showscale=True,
        # colorscale options
        #'Greys' | 'YlGnBu' | 'Greens' | 'YlOrRd' | 'Bluered' | 'RdBu' |
        #'Reds' | 'Blues' | 'Picnic' | 'Rainbow' | 'Portland' | 'Jet' |
        #'Hot' | 'Blackbody' | 'Earth' | 'Electric' | 'Viridis' |
        #colorscale='RdYlBu',
        colorscale = [[0,'rgb(42,35,160)'], [0.25,'rgb(29,145,192)'], [0.5,'rgb(254,227,145)'], [0.75,'rgb(241,105,19)'], [1.0, 'rgb(227,26,28)']],
        line_width=0.5),
        showlegend=False)

node_deg = []
node_text_deg = []
for node in G.nodes():
    node_deg.append(G.nodes[node]['degree'])
    node_text_deg.append('Degree: '+str(G.nodes[node]['degree']))
count1, count2 = np.unique(node_deg, return_counts=True)

node_eig = []
node_text_eig = []
node_b = []
for node in G.nodes():
    node_eig.append(round(G.nodes[node]['centrality'],4))
    node_b.append(round(G.nodes[node]['betweenness'],4))
    node_text_eig.append('Hashtag: '+str(node)+', Influence: '
                         +str(round(G.nodes[node]['centrality'],4))
                         +', Betweenness: '+str(round(G.nodes[node]['betweenness'],4))
                         +', Degree: '+str(round(G.nodes[node]['degree'],4)))
    
node_deg2 = [10 + i*100 for i in node_deg]
node_trace.marker.size = node_deg2
node_trace.text = node_text_eig

text = 'Plot of the largest connected subnetwork, <br>\
        displaying degree centrality (proportional to the size of node), <br>\
        eigenvector centrality (hover mouse), <br>\
        and betweenness centrality (hover mouse).'

#Creating the figure 
fig = go.Figure(data = [edge_trace, node_trace],
             layout=go.Layout(
                title='Hashtag Network (November 2019)',
                titlefont_size=24,
                showlegend=True,
                plot_bgcolor = 'rgb(224,243,219)',
                hovermode='closest',
                margin=dict(b=20,l=5,r=5,t=40),
                annotations=[ dict(
                    text=text,
                    showarrow=False,
                    xref="paper", yref="paper",
                    align="left",
                    x=0.005, y=-0.002 ) ],
                xaxis=dict(showgrid=False, zeroline=False, showticklabels=False),
                yaxis=dict(showgrid=False, zeroline=False, showticklabels=False))
                )

fig.update_layout(legend= dict(itemsizing='constant', itemclick='toggleothers', bgcolor='rgb(224,243,219)'))

fig.show()

plotly.offline.plot(fig, filename = 'figures/HTGS-network-112019.html', auto_open=False)
Out[33]:
'figures/HTGS-network-112019.html'